Text File | 1997-03-06 | 11.3 KB | 297 lines | [TEXT/ttxt] |
- Service C++ functions and classes
- dealing mostly with "advanced" i/o and the arithmetic compression
-
- ***** For the version history, read on
-
- ***** Comments/questions/problem reports/etc
- are all very welcome. Please send them to me at
- oleg@pobox.com or oleg@acm.org
-
- ***** Platforms
- I have personally compiled and tested the C++ advanced
- iostream libraries on the following platforms:
- SunSparc20/Solaris 2.4, gcc 2.7.2, libg++ 2.7.1
- SunSparc20/Solaris 2.3, SunPro C++ compiler
- HP 9000/{750,770,712}, HP/UX 9.0.5, 9.0.7 and 10.0,
- gcc 2.7.2, libg++ 2.7.2
- PowerMac 7100/80, 8500/132,
- Metrowerks CodeWarrior C++, v. 7 - 11
- Intel, Windows95, Borland C++ 4.5/5.0
- (the binaries then ran under Windows NT 4.0 beta)
- I know that the packages also work on DEC Alpha, FreeBSD,
- and Concurrent Maxion 8000/RTU 6.2V25 (all with gcc 2.7.2 compiler)
-
- ***** Verification files: vmyenv, vendian_io, vhistogram, varithm
-
- Don't forget to compile and run them; see the comments in the Makefile for
- details. The verification code checks that all the functions
- in this package have compiled and run correctly. The code can also serve as
- an example of how the package's classes and functions can be used.
-
-
- ***** Highlights and idioms
-
- ---- Extended file names
-
- The package adds support for "extended" file names with pipes in them.
- That is, the name of a file to open may now be specified as
- "| command" or "command |", i.e. as a pipe. For example,
- EndianIn istream;
- istream.open("gunzip < /tmp/aa.gz |");
- EndianOut stream("| compress > /tmp/aa.Z");
- image.write_pgm("| xv -");
- The <command> is launched in a subprocess through '/bin/sh' with its
- standard input/output hooked, through pipe(), to the file being
- opened.
-
- This extension is implemented on the lowest possible level, right
- before the request to open a file goes to OS (through the system call
- open(2)). A function sys_open() (in the source file sys_open.cc) acts
- as a "patch": that is, if you call sys_open() instead of open() to
- open a file, you get all the open() functionality plus the extended
- file names.
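-
- The naming convention itself can be sketched in a few lines. The code
- below is only an illustration of how a name might be classified before
- the open request is dispatched; it is not the code of sys_open.cc, and
- the names NameKind, ParsedName and parse_extended_name() are
- hypothetical.

```cpp
#include <string>

// Hypothetical names throughout; this is not the package's sys_open().
enum class NameKind { PlainFile, ReadFromCommand, WriteToCommand };

struct ParsedName {
    NameKind kind;
    std::string target;   // file path, or the shell command without the '|'
};

// Strip leading and trailing blanks and tabs.
static std::string trim(const std::string& s)
{
    const char* const blanks = " \t";
    const std::string::size_type first = s.find_first_not_of(blanks);
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(blanks);
    return s.substr(first, last - first + 1);
}

// "| command"  -> the stream writes into the command's standard input
// "command |"  -> the stream reads from the command's standard output
// anything else is treated as an ordinary file name
ParsedName parse_extended_name(const std::string& raw)
{
    const std::string name = trim(raw);
    if (name.size() > 1 && name[0] == '|')
        return ParsedName{NameKind::WriteToCommand, trim(name.substr(1))};
    if (name.size() > 1 && name[name.size() - 1] == '|')
        return ParsedName{NameKind::ReadFromCommand,
                          trim(name.substr(0, name.size() - 1))};
    return ParsedName{NameKind::PlainFile, name};
}
```

- Given a command, a POSIX program could then hand it to popen() with
- mode "r" or "w"; the package itself, as described above, launches the
- command through '/bin/sh' and hooks it up with pipe().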
-
- Thus, some libg++ 2.7.2 iostream functions were modified to call
- sys_open() instead of open(). To use the extended file names
- outside gcc/libg++, one needs to make the open -> sys_open substitution
- oneself.
-
-
- ---- Explicit Endian I/O of short/long integers
-
- EndianOut stream("/tmp/aa");
- stream.set_littlendian();
- stream.write_long(1);
-
- This means that 1 is written as a long integer with the least
- significant byte first, NO MATTER which computer (computer
- architecture) the code is running on. Specifying the byte order
- explicitly (as above) is the only way to ensure portability of
- binary files containing arithmetic data.
-
-
- ---- Stream sharing
-
- EndianIn/Out streams can share the same i/o buffer. This is useful
- when one needs to read/write a "stratified" (layered) file consisting
- of various variable-bit encoded data interspersed with headers. For
- example, a file may begin with a header (telling the total number of
- data items, normalization factors) followed by some variable-bit
- encoding of items, followed by another header, followed by an
- arithmetic compressed stream of data, etc. Thus, a file can be like a
- waffle pie, made of many layers: each of them being interpreted using
- different streams, each of them collectively sharing the same file and
- the same file pointer. The situation is similar to sharing an open
- file (and a file pointer) among parent and child (forked) processes.
-
- Note that merely opening a stream on a dup()-ed file handle, or
- sync()-ing the stream, is not quite enough. See endian_io.cc for
- more discussion. The bottom line is, this package implements stream
- sharing in a safe and portable way: it works on a Mac just as well as
- on different flavors of UNIX.
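-
- The basic idea of several streams over one buffer can be shown with
- the standard library alone. This is only a sketch of the concept: the
- function write_layered() is hypothetical, and the package's EndianIn/Out
- sharing deals with further issues that endian_io.cc discusses.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Two std::ostream objects attached to one streambuf: whatever either of
// them writes lands in the same buffer, at the same shared position.
std::string write_layered()
{
    std::stringbuf shared(std::ios_base::out);
    std::ostream header(&shared);     // the "header" layer
    std::ostream payload(&shared);    // the "data" layer, same buffer

    header << "HDR1";
    payload << "data-a";
    header << "HDR2";
    payload << "data-b";
    return shared.str();              // layers appear in the order written
}
```

- Neither stream owns the buffer here, so the buffer must outlive both
- of them.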
-
- ---- Simple variable-length coding of short integers
-
- The code is intended for writing a collection of short integers where
- many of them are rather small in value; still, big values can crop up
- at times, so we can't limit the size of the code to anything less than
- 16 bits. The code is a variation of a start-stop code described in
- Appendix A, "Variable-length representations of the integers", of the
- "Text Compression" book by T. Bell, J. Cleary and I. Witten,
- pp. 290-295. The present code supports both negative and
- positive numbers and includes an optimization based on the fact that all
- numbers are no larger than 2^15-1 in absolute value, and on the assumption
- that most of them are smaller than 512 (in absolute value).
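-
- To make the idea concrete, here is a deliberately simplified two-class
- code built on the same two facts (magnitudes fit in 15 bits, most are
- below 512). The bit layout, and the names Bits, encode_short() and
- decode_short(), are assumptions made for this illustration; the
- package's actual code differs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical two-class code, not the package's actual bit layout:
//   0 s mmmmmmmmm          (1 + 1 + 9  = 11 bits) when |v| < 512
//   1 s mmmmmmmmmmmmmmm    (1 + 1 + 15 = 17 bits) otherwise, |v| <= 32767
typedef std::vector<bool> Bits;

static void put_bits(Bits& bits, unsigned value, int count)
{
    for (int i = count - 1; i >= 0; --i)       // most significant bit first
        bits.push_back(((value >> i) & 1u) != 0);
}

static unsigned get_bits(const Bits& bits, std::size_t& pos, int count)
{
    unsigned value = 0;
    for (int i = 0; i < count; ++i)
        value = (value << 1) | (bits[pos++] ? 1u : 0u);
    return value;
}

void encode_short(Bits& bits, int v)           // requires |v| <= 32767
{
    const unsigned magnitude = static_cast<unsigned>(v < 0 ? -v : v);
    const bool is_long = magnitude >= 512;
    bits.push_back(is_long);                   // class selector
    bits.push_back(v < 0);                     // sign
    put_bits(bits, magnitude, is_long ? 15 : 9);
}

int decode_short(const Bits& bits, std::size_t& pos)
{
    const bool is_long = bits[pos++];
    const bool negative = bits[pos++];
    const int magnitude =
        static_cast<int>(get_bits(bits, pos, is_long ? 15 : 9));
    return negative ? -magnitude : magnitude;
}
```

- With this layout a small value costs 11 bits instead of 16, while a
- rare large value costs 17.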
-
-
- ---- Arithmetic compression of a stream of integers
-
- The present package provides a clean C++ implementation of Bell,
- Cleary and Witten's arithmetic compression code, with a clear
- separation between a model and the coder. ArithmCodingIn /
- ArithmCodingOut act as i/o streams that encode signed short integers
- when you put() them and decode them when you get() them. The
- ArithmCodingIn/Out object needs a "plug-in" of a class
- Input_Data_Model when the stream is created. The Input_Data_Model
- object is responsible for providing the codec with the probabilities
- (frequencies) a given data item is expected to appear with, and for
- finding a symbol given its cumulative frequency. Input_Data_Model may
- also modify itself to account for a new symbol. Thus, the ArithmCoding
- class is a sort of 'iostream' class that writes/reads data items
- to/from the stream, encoding/decoding them on the way. It relies upon the
- Input_Data_Model for the probabilities needed to perform the
- arithmetic coding.
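-
- The duties of a model listed above (report frequencies, find a symbol
- from a cumulative frequency, adapt to a new symbol) can be sketched as
- follows. The class SketchAdaptiveModel and its member names are
- hypothetical; this is not the package's Input_Data_Model interface, and
- a real model would also bound the total count.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical order-0 adaptive model over the symbols lo..hi (inclusive).
class SketchAdaptiveModel {
public:
    SketchAdaptiveModel(int lo, int hi)
        : lo_(lo), freq_(static_cast<std::size_t>(hi - lo + 1), 1u) {}

    // Sum of all frequency counts; a coder scales its interval by this.
    unsigned total() const
    {
        unsigned sum = 0;
        for (std::size_t i = 0; i < freq_.size(); ++i) sum += freq_[i];
        return sum;
    }

    // Cumulative frequency of all symbols that sort before 'symbol'.
    unsigned cum_below(int symbol) const
    {
        unsigned sum = 0;
        for (int s = lo_; s < symbol; ++s) sum += freq_[index(s)];
        return sum;
    }

    unsigned frequency(int symbol) const { return freq_[index(symbol)]; }

    // Decoder side: which symbol owns the given cumulative frequency?
    // Requires target < total().
    int find_symbol(unsigned target) const
    {
        unsigned sum = 0;
        for (std::size_t i = 0; i < freq_.size(); ++i) {
            sum += freq_[i];
            if (target < sum) return lo_ + static_cast<int>(i);
        }
        return lo_ + static_cast<int>(freq_.size()) - 1;  // invalid target
    }

    // Adapt: the symbol just coded becomes more probable.
    void update(int symbol) { ++freq_[index(symbol)]; }

private:
    std::size_t index(int symbol) const
    {
        return static_cast<std::size_t>(symbol - lo_);
    }
    int lo_;
    std::vector<unsigned> freq_;
};
```

- The encoder would call cum_below() and frequency() for each symbol it
- writes, the decoder would call find_symbol(), and both would call
- update() afterwards so that their models stay in step.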
-
- The current version of the package provides two Input_Data_Model
- plug-ins, both performing adaptive "modeling" of a stream of
- integers. The first plug-in uses a simple 0-order adaptive prediction
- (like the model given in Bell, Cleary and Witten's book). The other one
- takes a histogram to sketch the initial distribution, and is somewhat more
- sophisticated in updating the model. It is used in compressing a
- wavelet decomposition of an image. The code below (taken literally
- from varithm.cc) demonstrates how the coder classes are actually used.
-
- The first example writes two different streams into the same file. The
- streams follow different patterns, which is why it is better to encode
- them separately.
-
- EndianOut stream("/tmp/aa");
- stream.set_littlendian();
- const int sample_header = 12345;
- {
- AdaptiveModel model(-1,4);
- ArithmCodingOut ac(model);
- ac.open(stream);
- for(i=0; i<sizeof(pattern1)/sizeof(pattern1[0]); i++)
- ac.put(pattern1[i]);
- }
- {
- stream.write_long(sample_header); // write a "header"
- AdaptiveModel model(-1,4); // followed by the arithmetic coded
- ArithmCodingOut ac(model); // stream
- ac.open(stream);
- for(i=0; i<sizeof(pattern2)/sizeof(pattern2[0]); i++)
- ac.put(pattern2[i]);
- }
- stream.close();
-
- The reading is similar.
-
- The second example uses a different model plug-in, yet the i/o is similar
-
- static void test_adh(void)
- {
- message("\nCreating Histogram ...\n");
- Histogram histogram(-7,7);
- register int i;
- for(i=0; i<MyPattern_size; i++)
- histogram.put(MyPattern[i]);
-
- message("\nWriting data ...");
- AdaptiveHistModel model(histogram);
- ArithmCodingOut ac(model);
- ac.open("/tmp/aa");
- for(i=0; i<MyPattern_size; i++)
- ac.put(MyPattern[i]);
- ac.close();
-
- message("\nCoded file /tmp/aa has been created\n");
-
- AdaptiveHistModel i_model;
- ArithmCodingIn ac1(i_model);
- ac1.open("/tmp/aa");
- for(i=0; i<MyPattern_size; i++)
- {
- register int val_read = ac1.get();
- if( val_read != MyPattern[i] )
- _error("Read value %d of the %d-th integer is not what it is "
- "supposed to be, %d",
- val_read, i, MyPattern[i]);
- }
- ac1.get();
- assert( ac1.is_eof() );
- }
-
- ---- Convenience Functions
-
- The package defines a few functions I found convenient to use, like
- message(...) (which is equivalent to fprintf(stderr,...)) and
- _error(...) (the same as message(...) followed by abort()). One doesn't
- need to #include <stdio.h> to use them.
-
- Also included:
- xgetenv() - getenv() with a fall-back clause
- get_file_size() - also with a default clause
- does_start_with_ci() - an amazingly useful function in input parsing
- see vmyenv.cc for examples of their usage.
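-
- For readers without the sources at hand, the behavior described for two
- of these helpers might look roughly like the sketch below. The names
- xgetenv_sketch() and starts_with_ci_sketch() and their signatures are
- assumptions; myenv.h holds the package's real declarations.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical signatures; see myenv.h for the package's declarations.

// getenv() with a fall-back: returns default_value when the variable is
// not set. default_value must not be a null pointer.
std::string xgetenv_sketch(const char* name, const char* default_value)
{
    const char* const value = std::getenv(name);
    return value ? value : default_value;
}

// Case-insensitive test of whether 'text' begins with 'prefix'.
bool starts_with_ci_sketch(const char* text, const char* prefix)
{
    for (; *prefix != '\0'; ++text, ++prefix) {
        if (*text == '\0') return false;        // text ran out first
        const int a = std::tolower(static_cast<unsigned char>(*text));
        const int b = std::tolower(static_cast<unsigned char>(*prefix));
        if (a != b) return false;
    }
    return true;
}
```

- A prefix test of this kind is handy when matching keywords in input,
- for instance checking whether a header line begins with "content-type"
- regardless of how it is capitalized.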
-
- The validation file vmyenv.cc also illustrates how to catch an abort
- condition without crashing the main process (see the macro
- must_have_failed()).
-
-
- ---- Portability Tips
-
- Borland C++ 4.5 is sometimes unhappy with the order in which the BitIn,
- BitOut (in endian_io.h) and ArithmCodingIn, ArithmCodingOut (in arithm.h) classes
- are derived. Right now,
- class BitIn : BitIOBuffer, public EndianIn
- upsets BC because "RTTI class BitIn being derived from non-RTTI class
- BitIOBuffer". I have a hunch that an error like that could be avoided
- by tinkering with C++ compiler options. On the other hand, merely
- switching the order of inheritance,
- class BitIn : public EndianIn, BitIOBuffer
- solves the problem. The same for BitOut, ArithmCodingIn, and
- ArithmCodingOut.
-
-
-
- ***** Grand plans
-
- ***** Revision history
-
- Version 2.3 - Mar 1997
- - added xgetenv(), does_start_with_ci(), get_file_size()
- - created vmyenv.cc to validate myenv.h's functions
- - a few adjustments (mainly to endian_io.h and arithm.h)
- to account for changes in implementation (and interfaces,
- <sigh>) of the C++ iostream library, made in new versions
- of libg++ (v. 2.7.2) and Metrowerks CodeWarrior (v. 11)
- This brings c++advio closer to the (ever evolving) C++ standard.
- - _Vocabulary_ (an embedded language, actually) is now
- distributed with the c++advio, see voc.h for more detail.
-
- Version 2.2.3 - Mar 1996
- - sys_open.cc now accepts an input pipe with more than one link
- as a "file" name
- endian_io.*: added EndianIOData::unshare() method to break
- sharing of a streambuffer (if there was any). This method is intended
- for destructors only (makes the code more portable).
- - careful attention to comparisons between signed and unsigned
- (mainly to get gcc 2.7.2 to shut up)
- - now everything compiles with gcc 2.7.2/libg++ 2.7.1 and
- Metrowerks CodeWarrior 8.
- - portability tweaks in myenv.h (declaring bool for platforms
- that lack it)
- - arithm_modadh.*: more logical (and efficient) way of "pulling-to-
- the-front" when updating adaptive model frequency counters
- by more than 1. Also, the initial distribution is slightly
- tweaked. The upshot is that the compression is a tiny bit
- better (at least, the algorithm makes more sense).
-
- Version 2.2.1 - Jun 1995
- Fixed the last remaining incompatibility glitches. Now, exactly the
- same code compiles on a Mac with CodeWarrior 6 and on Unix with gcc
- 2.6.3
-
- Version 2.2 - May 1995
- Added a variable-length (start/stop) coding of signed short integers.
- Added handling of simple histograms of an integer-valued
- distribution.
-
- Version 2.1 - Mar 1995
- Introducing bool where appropriate (instead of int) and adding checks
- to make sure an EndianIn/Out stream was opened successfully.
-
- Version 2.0 - Feb 1995
- Big change: splitting EndianIO into EndianIn and EndianOut and
- removing all libg++-specific things; everything should be very
- portable now. Making sharing of the streambuffer portable.
-
- Version 1.4 - Feb 1994
- Updated for libg++ 2.5.3
-
- Version 1.3 - Aug 1993
- Introducing attachment of one stream to another, or sharing of a
- streambuf among several streams. Took care of properly terminating an
- arithm coding stream by writing a few phony bits at the end (so we
- won't hit the EOF on reading). Thus it is possible now to concatenate
- arithmetic coding streams.
-
- Version 1.2 - Jun 1992
- Updated to compile under gcc/g++ 2.2.1 and work with libg++ 2.0. The
- first implementation of the arithmetic coding package
-
- Version 1.1 - Nov 1991 - May 1992
- Initial revision
-
-